
News-Aware Direct Reinforcement Trading for Financial Markets

Lan, Qing-Yu, Wang, Zhan-He, Jiang, Jun-Qian, Wang, Yu-Tong, Piao, Yun-Song

arXiv.org Artificial Intelligence

The financial market is known to be highly sensitive to news. Therefore, effectively incorporating news data into quantitative trading remains an important challenge. Existing approaches typically rely on manually designed rules and/or handcrafted features. In this work, we directly use the news sentiment scores derived from large language models, together with raw price and volume data, as observable inputs for reinforcement learning. These inputs are processed by sequence models such as recurrent neural networks or Transformers to make end-to-end trading decisions. We conduct experiments using the cryptocurrency market as an example and evaluate two representative reinforcement learning algorithms, namely Double Deep Q-Network (DDQN) and Group Relative Policy Optimization (GRPO). The results demonstrate that our news-aware approach, which does not depend on handcrafted features or manually designed rules, can achieve performance superior to market benchmarks. We further highlight the critical role of time-series information in this process.
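To make the Double DQN component concrete, here is a minimal sketch of the observation construction and the DDQN target described in the abstract, assuming a per-step feature vector of (price, volume, LLM-derived sentiment score); the function names and window layout are illustrative assumptions, not the paper's code:

```python
import numpy as np

def make_observation(prices, volumes, sentiments, window):
    """Stack the last `window` steps of price, volume, and LLM-derived
    sentiment into one sequence observation for a recurrent/Transformer policy."""
    feats = np.stack([prices, volumes, sentiments], axis=-1)
    return feats[-window:]  # shape: (window, 3)

def ddqn_target(reward, gamma, q_online_next, q_target_next):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]
```

Decoupling action selection from action evaluation in this way is what distinguishes Double DQN from vanilla DQN and reduces Q-value overestimation.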


Gaussian Certified Unlearning in High Dimensions: A Hypothesis Testing Approach

Pandey, Aaradhya, Auddy, Arnab, Zou, Haolin, Maleki, Arian, Kulkarni, Sanjeev

arXiv.org Machine Learning

Machine unlearning seeks to efficiently remove the influence of selected data while preserving generalization. Significant progress has been made in low dimensions $(p \ll n)$, but high dimensions pose serious theoretical challenges as standard optimization assumptions of $\Omega(1)$ strong convexity and $O(1)$ smoothness of the per-example loss $f$ rarely hold simultaneously in proportional regimes $(p \sim n)$. In this work, we introduce $\varepsilon$-Gaussian certifiability, a canonical and robust notion well-suited to high-dimensional regimes, which optimally captures a broad class of noise-adding mechanisms. We then theoretically analyze the performance of a widely used unlearning algorithm based on one step of the Newton method in the high-dimensional setting described above. Our analysis shows that a single Newton step, followed by well-calibrated Gaussian noise, is sufficient to achieve both privacy and accuracy in this setting. This result stands in sharp contrast to the only prior work that analyzes machine unlearning in high dimensions, \citet{zou2025certified}, which relaxes some of the standard optimization assumptions for high-dimensional applicability but operates under the notion of $\varepsilon$-certifiability. That work concludes that a single Newton step is insufficient even for removing a single data point, and that at least two steps are required to ensure both privacy and accuracy. Our result leads us to conclude that the discrepancy in the number of steps arises from the suboptimality of the notion of $\varepsilon$-certifiability and its incompatibility with noise-adding mechanisms, which $\varepsilon$-Gaussian certifiability overcomes optimally.
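For intuition, here is a toy instance of the noisy one-step Newton unlearner on ridge regression, where the loss is quadratic so a single Newton step on the retained loss lands exactly at the retain-set minimizer before noise is added. The ridge choice and all names are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def newton_unlearn(theta, X, y, remove_idx, lam, sigma, rng):
    """One Newton step of the retained ridge loss, started at the
    full-data minimizer, followed by Gaussian noise of scale sigma."""
    keep = np.ones(len(y), dtype=bool)
    keep[remove_idx] = False
    Xr, yr = X[keep], y[keep]
    grad = Xr.T @ (Xr @ theta - yr) + lam * theta   # retained-loss gradient
    H = Xr.T @ Xr + lam * np.eye(X.shape[1])        # retained-loss Hessian
    theta_step = theta - np.linalg.solve(H, grad)   # one Newton step
    return theta_step + sigma * rng.standard_normal(theta.shape)
```

With `sigma = 0` this reproduces the exact retrained ridge solution on the retained data; the Gaussian noise is what provides the certifiability guarantee.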


Estimating Jaccard Index with Missing Observations: A Matrix Calibration Approach

Wenye Li

Neural Information Processing Systems

The Jaccard index is a standard statistic for comparing the pairwise similarity between data samples. This paper investigates the problem of estimating a Jaccard index matrix when there are missing observations in data samples. Starting from a Jaccard index matrix approximated from the incomplete data, our method calibrates the matrix to meet the requirement of positive semi-definiteness and other constraints, through a simple alternating projection algorithm. Compared with conventional approaches that estimate the similarity matrix based on the imputed data, our method has a strong advantage in that the calibrated matrix is guaranteed to be closer to the unknown ground truth in the Frobenius norm than the un-calibrated matrix (except in special cases where they are identical). We carried out a series of empirical experiments and the results confirmed our theoretical justification. The evaluation also reported significantly improved results in real learning tasks on benchmark datasets.
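The calibration step can be sketched as plain alternating projections between the PSD cone and the set of symmetric matrices with unit diagonal (self-similarity is 1). This is a simplified illustration under our own assumptions; it omits Dykstra-style corrections and any additional constraints the paper may use:

```python
import numpy as np

def calibrate(S, iters=500):
    """Alternately project an approximate similarity matrix onto the
    PSD cone and onto the unit-diagonal constraint."""
    A = (S + S.T) / 2.0
    for _ in range(iters):
        w, V = np.linalg.eigh(A)
        A = (V * np.maximum(w, 0.0)) @ V.T   # nearest PSD matrix (Frobenius)
        A = (A + A.T) / 2.0                  # clean up round-off asymmetry
        np.fill_diagonal(A, 1.0)             # restore self-similarity = 1
    return A
```

Both constraint sets are convex, so the iterates converge to a matrix satisfying both; the eigenvalue clipping in the loop is the standard Frobenius-nearest PSD projection.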


On the Origins of Sampling Bias: Implications on Fairness Measurement and Mitigation

Zhioua, Sami, Binkyte, Ruta, Ouni, Ayoub, Ktata, Farah Barika

arXiv.org Artificial Intelligence

Accurately measuring discrimination is crucial to faithfully assessing the fairness of trained machine learning (ML) models. Any bias in measuring discrimination leads to either amplification or underestimation of the existing disparity. Several sources of bias exist, and it is typically assumed that bias resulting from machine learning is borne equally by different groups (e.g. females vs males, whites vs blacks, etc.). If, however, bias is borne differently by different groups, it may exacerbate discrimination against specific sub-populations. Sampling bias, in particular, is inconsistently used in the literature to describe bias due to the sampling procedure. In this paper, we attempt to disambiguate this term by introducing clearly defined variants of sampling bias, namely, sample size bias (SSB) and underrepresentation bias (URB). Through an extensive set of experiments on benchmark datasets and using mainstream learning algorithms, we expose relevant observations in several model training scenarios. The observations are finally framed as actionable recommendations for practitioners.
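As a concrete illustration of how sample size bias can affect a fairness measurement, here is one common group-fairness metric (statistical parity difference) together with a subsampling routine that shrinks one group before re-measuring; the metric choice and all names are our own assumptions, not necessarily the paper's definitions:

```python
import numpy as np

def statistical_parity_diff(y_pred, group):
    """P(y_hat = 1 | group = 0) - P(y_hat = 1 | group = 1)."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def subsampled_spd(y_pred, group, frac, rng):
    """Simulate sample size bias: keep only a fraction of group 1,
    then re-measure the parity gap on the shrunken sample."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    idx1 = np.flatnonzero(group == 1)
    kept1 = rng.choice(idx1, size=max(1, int(frac * len(idx1))), replace=False)
    keep = np.concatenate([np.flatnonzero(group == 0), kept1])
    return statistical_parity_diff(y_pred[keep], group[keep])
```

Repeating `subsampled_spd` over many random draws at small `frac` shows the estimated gap becoming noisier for the underrepresented group, which is the kind of measurement distortion the paper's SSB/URB distinction targets.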